Distributionally Robust Ensemble of Lottery Tickets Towards Calibrated Sparse Network Training
Recently developed sparse network training methods, such as the Lottery Ticket Hypothesis (LTH) and its variants, have shown impressive learning capacity by finding sparse sub-networks within a dense one. While these methods can substantially sparsify deep networks, they generally focus on matching the accuracy of dense counterparts and neglect network calibration. However, achieving calibrated network predictions lies at the core of improving model reliability, especially when it comes to addressing overconfidence and out-of-distribution cases. In this study, we propose a novel Distributionally Robust Optimization (DRO) framework that learns an ensemble of lottery tickets for calibrated network sparsification. Specifically, the proposed DRO ensemble learns multiple diverse and complementary sparse sub-networks (tickets) under the guidance of uncertainty sets, which encourage the tickets to gradually capture different data distributions from easy to hard and to naturally complement one another. We theoretically justify the strong calibration performance by showing how the proposed robust training process provably lowers the confidence of incorrect predictions. Extensive experimental results on several benchmarks show that the proposed lottery ticket ensemble yields a clear calibration improvement without sacrificing accuracy or increasing inference cost. Furthermore, experiments on OOD datasets demonstrate the robustness of our approach in open-set environments.
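As background for the abstract above, the LTH recipe it builds on finds tickets by iterative magnitude pruning: train, prune the smallest-magnitude weights, then rewind survivors to their initialization. The sketch below shows one such round; it is the generic LTH procedure, not the paper's DRO ensemble, and the function name and prune fraction are illustrative.

```python
import numpy as np

def imp_round(weights, init_weights, mask, prune_frac=0.2):
    """One round of iterative magnitude pruning (generic LTH recipe):
    prune the smallest-magnitude surviving weights, then rewind the
    survivors to their initial values."""
    # magnitudes of currently active (unpruned) weights
    active = np.abs(weights[mask])
    # prune the prune_frac smallest active weights this round
    k = int(prune_frac * active.size)
    threshold = np.sort(active)[k] if k > 0 else 0.0
    new_mask = mask & (np.abs(weights) >= threshold)
    # rewind: surviving weights restart from their original initialization
    ticket = np.where(new_mask, init_weights, 0.0)
    return ticket, new_mask

rng = np.random.default_rng(0)
init = rng.normal(size=100)                      # initialization snapshot
trained = init + rng.normal(scale=0.1, size=100) # stand-in for trained weights
mask = np.ones(100, dtype=bool)
ticket, mask = imp_round(trained, init, mask, prune_frac=0.2)
print(mask.sum())  # 80 of 100 weights survive the first round
```

Repeating `imp_round` with retraining between rounds yields progressively sparser tickets; the paper's contribution is learning several such tickets jointly under a DRO objective rather than one in isolation.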
Mastering Continual Reinforcement Learning through Fine-Grained Sparse Network Allocation and Dormant Neuron Exploration
Zheng, Chengqi, Yin, Haiyan, Chen, Jianda, Ng, Terence, Ong, Yew-Soon, Tsang, Ivor
Continual Reinforcement Learning (CRL) is essential for developing agents that can learn, adapt, and accumulate knowledge over time. A fundamental challenge persists, however: agents must strike a delicate balance between plasticity, which enables rapid skill acquisition, and stability, which ensures long-term knowledge retention and prevents catastrophic forgetting. In this paper, we introduce SSDE, a novel structure-based approach that enhances plasticity through a fine-grained allocation strategy with Structured Sparsity and Dormant-guided Exploration. SSDE decomposes the parameter space into forward-transfer (frozen) parameters and task-specific (trainable) parameters. Crucially, these parameters are allocated by an efficient co-allocation scheme under sparse coding, ensuring sufficient trainable capacity for new tasks while promoting efficient forward transfer through the frozen parameters. However, structure-based methods often suffer from rigidity due to the accumulation of non-trainable parameters, which limits exploration and adaptability. To address this, we further introduce a sensitivity-guided neuron reactivation mechanism that systematically identifies and resets dormant neurons, i.e., neurons that exhibit minimal influence on the sparse policy network during inference. This approach effectively enhances exploration while preserving structural efficiency. Extensive experiments on the CW10-v1 Continual World benchmark demonstrate that SSDE achieves state-of-the-art performance, reaching a 95% success rate and significantly surpassing prior methods in the plasticity-stability trade-off (code is available at: https://github.com/chengqiArchy/SSDE).
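The dormant-neuron reset described above can be sketched minimally as follows. This is an assumption-laden illustration, not SSDE's actual sensitivity criterion: here a hidden neuron is deemed dormant when its mean activation is a tiny fraction of the layer average (the relative-threshold rule used in prior dormant-neuron work), its incoming weights are reinitialized, and its outgoing weights are zeroed to avoid disrupting the policy's outputs.

```python
import numpy as np

def reset_dormant_neurons(W_in, W_out, activations, tau=0.025):
    """Hypothetical dormant-neuron reset for one hidden layer.
    activations: (batch, n_hidden) recorded hidden activations.
    W_in: (n_prev, n_hidden) incoming weights; W_out: (n_hidden, n_next)."""
    mean_act = np.abs(activations).mean(axis=0)  # per-neuron activity score
    dormant = mean_act < tau * mean_act.mean()   # relative dormancy threshold
    rng = np.random.default_rng(0)
    W_in, W_out = W_in.copy(), W_out.copy()
    # reactivate: fresh incoming weights so the neuron can learn again
    W_in[:, dormant] = rng.normal(scale=0.1, size=(W_in.shape[0], dormant.sum()))
    # zero outgoing weights so the reset does not perturb current behavior
    W_out[dormant, :] = 0.0
    return W_in, W_out, dormant

# toy layer with 4 hidden neurons; neuron 0 never fires
acts = np.ones((32, 4)); acts[:, 0] = 0.0
W_in = np.zeros((8, 4)); W_out = np.ones((4, 2))
W_in2, W_out2, dormant = reset_dormant_neurons(W_in, W_out, acts)
```

Only neuron 0 is flagged: its incoming weights become nonzero again while its outgoing row is zeroed, trading no immediate behavior change for restored trainable capacity.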
Random Search as a Baseline for Sparse Neural Network Architecture Search
Overparameterized neural networks are loosely characterized as networks that have a very high fitting (or memorization) capacity with respect to their training data. Although capable of memorizing their training data, these networks intriguingly achieve very low test error, close to their training error rates [1, 2]. Meanwhile, sparse neural networks have shown similar or better generalization performance than their dense counterparts while having higher parameter efficiency [3]. With the increasing availability of hardware and software that support sparse computational operations [4, 5], there has been growing interest in finding sparse sub-networks within large overparameterized models, either to improve generalization performance or to gain computational efficiency at the same performance level [6, 7, 8, 3]. Earlier works on creating efficient sparse sub-networks include the now-popular pruning technique [9]. These were motivated by the desire to achieve compute efficiency in resource-constrained applications by finding smaller networks within a larger network space without losing task performance [10]. The original pruning technique involves fully training a larger network on some task, discarding the task-irrelevant connections, and then fine-tuning the remaining sparse sub-network on the task to achieve a level of performance near that of the larger network. Connections were originally pruned based on loss Hessians [9, 11]. Later, other techniques were proposed, such as the removal of weak connections based on weight-value thresholds [12].
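The weight-threshold pruning mentioned last, i.e. removing weak connections [12], reduces to a one-liner over weight magnitudes. The sketch below is a minimal illustration (function name and sparsity level are ours, not from any cited work): zero out the fraction of weights with the smallest absolute values, keeping a boolean mask for the subsequent fine-tuning step.

```python
import numpy as np

def threshold_prune(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with the smallest
    magnitudes; returns the pruned weights and the keep-mask."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # k-th order statistic of |w| serves as the pruning threshold
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 10))
pruned, mask = threshold_prune(W, sparsity=0.9)
print(mask.sum())  # 10 of 100 weights survive
```

In the full prune-then-fine-tune pipeline described above, the surviving weights in `pruned` would then be retrained with the mask held fixed.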